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Abstract —We consider the problem of recovering two- 
dimensional (2-D) block-sparse signals with unknown cluster 
patterns. Two-dimensional block-sparse patterns arise naturally 
in many practical applications such as foreground detection and 
inverse synthetic aperture radar imaging. To exploit the block- 
sparse structure, we introduce a 2-D pattern-coupled hierarchical 
Gaussian prior model to characterize the statistical pattern 
dependencies among neighboring coefficients. Unlike the conven¬ 
tional hierarchical Gaussian prior model where each coefficient 
is associated independently with a unique hyperparameter, the 
pattern-coupled prior for each coefficient not only involves 
its own hyperparameter, but also its immediate neighboring 
hyperparameters. Thus the sparsity patterns of neighboring 
coefficients are related to each other and the hierarchical model 
has the potential to encourage 2-D structured-sparse solutions. 
An expectation-maximization (EM) strategy is employed to obtain 
the maximum a posterior (MAP) estimate of the hyperpa¬ 
rameters, along with the posterior distribution of the sparse 
signal. In addition, the generalized approximate message passing 
(GAMP) algorithm is embedded into the EM framework to 
efficiently compute an approximation of the posterior distribution 
of hidden variables, which results in a significant reduction in 
computational complexity. Numerical results are provided to 
illustrate the effectiveness of the proposed algorithm. 

Index Terms —Pattern-coupled sparse Bayesian learning, 
block-sparse structure, expectation-maximization (EM), gener¬ 
alized approximate message passing (GAMP). 


I. Introduction 

Compressed sensing is a recently emerged technique for 
signal sampling and data acquisition which enables to recover 
sparse signals from undersampled linear measurements 

y = Ax + w (1) 

where A G R MxAr is a sampling matrix with M <C N, 
x denotes an iV-dimensional sparse signal, and w denotes 
the additive noise. The problem has been extensively studied 
and a variety of algorithms, e.g. the orthogonal matching 
pursuit (OMP) algorithm [1], the basis pursuit (BP) method 
[2], the iterative reweighted I\ and £2 algorithms [3], and the 
sparse Bayesian learning method [4]-[6] were proposed. In 
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many practical applications, in addition to the sparse structure, 
sparse signals may exhibit two-dimensional cluster patterns 
that can be utilized to enhance the recovery performance. 
For example, the target of interest in the synthetic aperture 
radar/inverse synthetic aperture radar (SAR/ISAR) images 
often demonstrates continuity in both the range and cross¬ 
range domains [7]. In video surveillance, the foreground image 
exhibits a cluster pattern since the foreground objects (humans, 
cars, text etc.) generally occupy a small continuous region of 
the scene [8]. Besides these, block-sparsity is also present in 
temporal observations of a time-varying block-sparse signal 
whose support varies slowly over time [9]. 

Analyses [10]—[12] show that exploiting the inherent block- 
sparse structure not only leads to relaxed conditions for 
exact reconstruction, but also helps improve the recovery 
performance considerably. A number of algorithms have been 
proposed for recovering block-sparse signals over the past few 
years, e.g., block-OMP [12], mixed G/^i norm-minimization 
[10], group LASSO [13], model-based CoSaMP [11], and 
block-sparse Bayesian learning [14], [15]. These algorithms, 
however, require a priori knowledge of the block partition (e.g. 
the number of blocks and location of each block) such that the 
coefficients in each block are grouped together and enforced 
to share a common sparsity pattern. In practice, the prior 
information about the block partition of sparse signals is often 
unavailable, especially for two-dimensional signals since the 
block partition of a two-dimensional signal involves not only 
the location but also the shape of each block. For example, 
foreground images have irregular and unpredictable cluster 
patterns which are very difficult to be estimated a priori. To 
address this difficulty, a few sophisticated Bayesian methods 
which do not need the knowledge of the block partition 
were developed. In [16], a “spike-and-slab” prior model was 
proposed, where by introducing dependencies among mixing 
weights, the prior model has the potential to encourage sparsity 
and promote a tree structure simultaneously. This “spike-and- 
slab” prior model was later extended to accommodate block- 
sparse signals [7], [17]. Nevertheless, for the “spike-and-slab” 
prior introduced in [16], [17], the posterior distribution cannot 
be derived analytically, and a Markov chain Monte Carlo 
(MCMC) sampling method has to be employed for Bayesian 
inference. In [18], [19], a graphical prior, also referred to as the 
“Boltzmann machine”, is employed as a prior on the sparsity 
support in order to induce statistical dependencies between 
atoms. With such a prior, the maximum a posterior (MAP) es¬ 
timator requires an exhaustive search over all possible sparsity 


2 


patterns. To overcome the intractability of the combinatorial 
search, a greedy method [18] and a variational mean-field 
approximation method [19] were developed to approximate 
the MAP. In [20], to cope with the unknown cluster pattern, 
an expanded model is employed by assuming that the original 
sparse signal is a superposition of a number of overlapping 
blocks, and the coefficients in each block share the same 
sparsity pattern. Conventional block sparse Bayesian learning 
algorithms such as those in [15] can then be applied to the 
expanded model. 

Recently in [21], we proposed a pattern-coupled hierarchical 
Gaussian prior model to exploit the unknown block-sparse 
structure. Unlike the conventional hierarchical Gaussian prior 
model [4]-[6] where each coefficient is associated indepen¬ 
dently with a unique hyperparameter, the pattern-coupled prior 
for each coefficient not only involves its own hyperparameter, 
but also its immediate neighboring hyperparameters. This 
pattern-coupled hierarchical model is effective and flexible 
to capture any underlying block-sparse structures, without 
requiring the prior knowledge of the block partition. Numerical 
results show that the pattern-coupled sparse Bayesian learning 
(PC-SBL) method renders competitive performance for block- 
sparse signal recovery. Nevertheless, a major drawback of the 
method is that it requires computing an N x N matrix inverse 
at each iteration, and thus has a cubic complexity in terms of 
the signal dimension. This high computational cost prohibits 
its application to problems with even moderate dimensions. 
Also, [21] only considers recovery of one-dimensional block- 
sparse signals. In this paper, we generalize the pattern-coupled 
hierarchical model to the two-dimensional (2-D) scenario 
in order to leverage block-sparse patterns arising from 2-D 
sparse signals. To address the computational issue, we resort 
to the generalized approximate message passing (GAMP) 
technique [22] and develop a computationally efficient method. 
Specifically, the algorithm is developed within an expectation- 
maximization (EM) framework, using the GAMP to efficiently 
compute an approximation of the posterior distribution of 
hidden variables. The hyperparameters associated with the 
hierarchical Gaussian prior are learned by iteratively maximiz¬ 
ing the Q-function which is calculated based on the posterior 
approximation obtained from the GAMP. Simulation results 
show that the proposed method presents superior recovery 
performance for block-sparse signals, meanwhile achieving a 
significant reduction in computational complexity. 

The rest of the paper is organized as follows. In Section 
HD we introduce a 2-D pattern coupled hierarchical Gaus¬ 
sian framework to model the sparse prior and the pattern 
dependencies among the neighboring coefficients. In Section 
ED a GAMP-based EM algorithm is developed to obtain the 
maximum a posterior (MAP) estimate of the hyperparameters, 
along with the posterior distribution of the sparse signal. 
Simulation results are provided in Section |IV] followed by 
concluding remarks in Section [V] 

II. Bayesian Model 

We consider the problem of recovering a two-dimensional 
block-sparse signal X £ K.^ xi from compressed noisy 


measurements 

y = f(X)+w (2) 

where y £ E A/ denotes the compressed measurement vector, 
/(•) is a linear map: K < 3 xi —> R M , with M <C N = QL, 
and w £ R m is an additive multivariate Gaussian noise with 
zero mean and covariance matrix a 2 I. Let x = vec(X), the 
linear map f(X) can generally be expressed as 

f(X) = Ax (3) 

where A £ M. AIxN denotes the measurement matrix. In 
the special case where f(X) = vec (BX), then we have 
A = I®B, in which (g> stands for the Rronecker product. The 
above model Q arises in image applications where signals are 
multi-dimensional in nature, or in the scenario where multiple 
snapshots of a time-varying sparse signal are available. In these 
applications, signals usually exhibit two-dimensional cluster 
patterns that can be utilized to improve the recovery accuracy. 
To leverage the underlying block-sparse structures, we intro¬ 
duce a 2-D pattern-coupled Gaussian prior model which is a 
generalization of our previous work [21]. Before proceeding, 
we provide a brief review of the conventional hierarchical 
Gaussian prior model [4], and some of its extensions. 

A. Review of Conventional Gaussian Prior Model 

For ease of exposition, we consider the prior model for 
the two-dimensional signal X instead of its one-dimensional 
form x. Let x q j denote the (q, /)th entry of X. In the 
conventional sparse Bayesian learning framework [4], a two- 
layer hierarchical Gaussian prior was employed to promote the 
sparsity of the solution. In the first layer, coefficients {x q i } 
of X are assigned a Gaussian prior distribution 
Q L 

p(X\ a ) = Hl[X(x q , l \0 1 a-}) (4) 

9=1 1=1 

where o q j is a non-negative hyperparameter controlling the 
sparsity of the coefficient x q j. The second layer specifies 
Gamma distributions as hyperpriors over the hyperparameters 

a ~ i a qA’ i' e - 

Q L 

p(a) = nil Gamma(a gi j|a, b) (5) 

9=1 1=1 

As discussed in [4], for properly chosen a and b, this hy¬ 
perprior allows the posterior mean of a q> i to become ar¬ 
bitrarily large. As a consequence, the associated coefficient 
x q j will be driven to zero, thus yielding a sparse solu¬ 
tion. This conventional hierarchical model, however, does not 
encourage structured-sparse solutions since the sparsity of 
each coefficient is determined by its own hyperparameter and 
the hyperparameters are independent of each other. In [14], 
[15], the above hierarchical model was generalized to deal 
with block-sparse signals, in which a group of coefficients 
sharing the same sparsity pattern are assigned a multivariate 
Gaussian prior parameterized by a common hyperparameter. 
Nevertheless, this model requires the knowledge of the block 
partition to determine which coefficients should be grouped 
and assigned a common hyperparameter. 
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B. Proposed 2-D Pattern-Coupled Plierarchical Model 

To exploit the 2-D block-sparse structure, we utilize the 
fact that the sparsity patterns of neighboring coefficients are 
statistically dependent. To capture the pattern dependencies 
among neighboring coefficients, the Gaussian prior for each 
coefficient x q j not only involves its own hyperparameter o q j, 
but also its immediate neighbor hyperparameters. Specifically, 
a prior over X is given by 

Q L 

p(x\a )= nn^^i 0 ’^ 1 ) (6) 

q= 1 Z=1 

where 

S q ,i = a q ,z + /? ai ’i ( 7 ) 

in which 7V( g ^ denotes the neighborhood of the grid point 
{qj), i.e. N {q ^ = {(<?, l - 1 ), {q, l + 1 ), (q - 1 , l), (q + 1 , Z))GL 
and (3 £ [0,1] is a parameter indicating the pattern relevance 
between the coefficient x q j and its neighboring coefficients. 
Clearly, this model is an extension of our previous prior 
model [21] to the two-dimensional case. When (3 = 0, the 
prior model © reduces to the conventional sparse Bayesian 
learning model. When (3 > 0, we see that the sparsity of each 
coefficient x q j is not only controlled by the hyperparameter 
but also by the neighboring hyperparameters S 0q , = 
{ a i.j I (*i j) £ N {q ,i)}- The coefficient x q j will be driven to 
zero if a q j or any of its neighboring hyperparameters goes 
to infinity. In other words, suppose a q j approaches infinity, 
then not only its corresponding coefficient x q j will be driven 
to zero, the neighboring coefficients S x . = £ 

Niqj)} will decrease to zero as well. We see that the sparsity 
patterns of neighboring coefficients are related to each other 
through their shared hyperparameters. On the other hand, for 
any pair of neighboring coefficients, each of them has its own 
hyperparameters that are not shared by the other coefficient. 
Hence, no coefficients are pre-specified to share a common 
sparsity pattern, which enables the prior to provide flexibility 
to model any block-sparse structures. 

Following [4], we use Gamma distributions as hyperpriors 
over the hyperparameters {a q j }, i.e. 

Q L 

p(a) = nn Gamma(a gi ;|a, b) ( 8 ) 

q— 1 1=1 

where we set a > 1, and b = 10 -6 . The choice of a will be 
elaborated later in our paper. Also, the noise variance a 2 = 
I /7 is assumed unknown, and to estimate this parameter, we 
place a Gamma hyperprior over 7 , i.e. 

p{ 7 ) = Gamma( 7 |c, d) (9) 

where we set c = 1 and d = 10~ 6 . 

'Note that for edge grid points, they only have two or three immediate 
neighboring points, in which case the definition of changes accordingly. 


III. Proposed Algorithm 

We now proceed to perform Bayesian inference for the 
proposed pattern-coupled hierarchical model. The following 
model is considered since the linear map f(X) can be 
expressed as f(X ) = Ax 

y = Ax + w (10) 

We first translate the prior for the two-dimensional signal X 
to a prior for its one-dimensional form x. From ©. the prior 
over x can be expressed as 

N 

p(x\a) = Y[Af(x n \0,ri~ 1 ) ( 11 ) 

71 = 1 

where 

Vn =a n + (3 ^ OLi ( 12 ) 

i€JV( „) 

in which Nr „) denotes the neighbors of the point (q,l) on 
the two-dimensional grid, i.e. = {(/ — 2 )Q + q,lQ + 
q, (l — 1 )Q + q — 1, (l — 1 )Q + q + 1}@. The relation between 
n and (g, l) is given by n = (l — 1 )Q + q , that is, l = \n/Q~\, 
and q = n mod Q, in which [•] denotes the ceiling operator. 
Note that for notational convenience, we, with a slight abuse 
of notation, use x n to denote the ?ith entry of x and a n to 
denote the hyperparameter associated with the coefficient x n . 
Also, let a = {a n } since its exact meaning remains unaltered. 
From ([ 8 ]), we have 

N 

p(a) = n Gamma(a n |a, b) (13) 

71=1 

An expectation-maximization (EM) algorithm can be de¬ 
veloped for learning the sparse signal x as well as the 
hyperparameters {ck, 7 }. In the EM formulation, the signal x 
is treated as hidden variables, and we iteratively maximize a 
lower bound on the posterior probability p(a, x;\y) (this lower 
bound is also referred to as the Q-function). Briefly speaking, 
the algorithm alternates between an E-step and a M-step. In 
the E-step, we need to compute the posterior distribution of 
x conditioned on the observed data and the hyperparameters 
estimated from the 7th iteration, i.e. 

p(x\y, a^, 7 ^) oc p(x\a (t ^)p(y\x,'y l ' t ^) (14) 

It can be readily verified that the posterior p(x\y, a®, 7 W) 
follows a Gaussian distribution with its mean and covariance 
matrix given respectively by 

H (15) 

$ =( 1 ^A T APD)~ 1 (16) 

where D = diag(y^,..., rj$). The Q-function, i.e. 
E x \ y .(*(*), 7 (‘) [logger 71?/)]- can then be computed, where the 
operator E x | y , a (t), 7 (t) [•] denotes the expectation with respect 
to the posterior distribution p(x\y, a^, 7 ^). In the M-step, 

2 For edge grid points which have only two or three neighboring points, the 
definition of changes accordingly. 
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we maximize the Q-function with respect to the hyperparam¬ 
eters {a, 7}. 

It can be seen that the EM algorithm, at each iteration, 
requires to update the posterior distribution p(x\y, 
which involves computing an TV x TV matrix inverse. Thus 
the EM-based algorithm has a computational complexity of 
0(N 3 ) flops, and therefore is not suitable for many real-world 
applications involving large dimensions. In the following, we 
will develop a computationally efficient algorithm by resorting 
to the generalized approximate message passing (GAMP) 
technique [22], GAMP is a very-low-complexity Bayesian 
iterative technique recently developed [22] for obtaining an ap¬ 
proximation of the posterior distribution p(x\y, It 

therefore can naturally be embedded within the EM framework 
to replace the computation of the true posterior distribution. 
From GAMP’s point of view, the hyperparameters {«. 7} are 
considered as known. The hyperparameters can be updated in 
the M-step based on the approximate posterior distribution of 
x. We now proceed to derive the GAMP algorithm for the 
pattern-coupled Gaussian hierarchical prior model. 


A. Pattern-Coupled Hierarchical Gaussian GAMP 


GAMP was developed in a message passing-based frame¬ 
work. By using central-limit-theorem approximations, message 
passing between variable nodes and factor nodes can be greatly 
simplified, and the loopy belief propagation on the underlying 
factor graph can be efficiently performed. As noted in [22], 
[23], the central-limit-theorem approximations become exact 
in the large-system limit under i.i.d. zero-mean sub-Gaussian 
A. 

For notational convenience, let G = {a. 7} denote the 
hyperparameters. Firstly, GAMP assumes posterior indepen¬ 
dence among hidden variables {x n } and approximates the true 
posterior distribution p(x n \y, 0) by 


p{x n \y,r n ,T x ,G) 


p(x n \G)N(x n \r n ,T x ) 
f x p(x n \0)Af(x n \r n ,Tr) 


(17) 


where f n and t x are quantities iteratively updated during 
the iterative process of the GAMP algorithm. Here, we have 
dropped their explicit dependence on the iteration number k 
for simplicity. Substituting CD into CD. it can be easily 
verified that the approximate posterior p(x n \y, f n , t x , G) fol¬ 
lows a Gaussian distribution with its mean and variance given 
respectively as 


u = 
rn 


+X A 


(18) 

(19) 


Another approximation is made to the noiseless output z m = 
a^x, where aj n denotes the mth row of A. GAMP approxi¬ 
mates the true marginal posterior p(z m \y, 6) by 


p{zm\y,Pm,T^,e) 


P(ym\z m , 6)Af(Zm\Pm,TV (2Q) 
J z P(jjm |~m? G)J\f ( Z m | Pmi ^m) 


where p m and 7 )’, are quantities iteratively updated during 
the iterative process of the GAMP algorithm. Again, here we 


dropped their explicit dependence on the iteration number k. 
Under the additive white Gaussian noise assumption, we have 
P(y m \Zm,d) = Af(y m \z m , 1/7)- Thus P{Zm\y,PmiT?n, 0 ) 
also follows a Gaussian distribution with its mean and variance 
given by 


2 A + Pm 

1+7T& 


1+7 Hr 


( 21 ) 

( 22 ) 


With the above approximations, we can now define the 
following two scalar functions: gm(-) and g out (-) that are used 
in the GAMP algorithm. The input scalar function g m (-) is 
simply defined as the posterior mean p x [22], i.e. 


9in(r n ,Tn,0) = p x n 


fn 

1 + y n r x 


(23) 


The scaled partial derivative of T x gi n (f n ,T x ,G) with respect 
to r n is the posterior variance cf> x , i.e. 


n 1 /In 1 n 


(24) 


The output scalar function g 0ut (-) is related to the posterior 
mean as follows 


goul{Pm,T^G) — p (p m Pm) — 


1 ( T miy™ + Pm 
1 + 7 T n 


Pm 

(25) 


The partial derivative of g out (j5 m , r^, 6) is related to the 
posterior variance </>^ in the following way 


1 dp. 


~gout{Pmi T mi G) — 


Tm 


1 + 7 T% 


(26) 


Given the above definitions of g m (-) and g Qut (-), the GAMP 
algorithm tailored to the considered sparse signal estimation 
problem can now be summarized as follows (details of the 
derivation of the GAMP algorithm can be found in [22]), in 
which a m n denotes the (m, n)th entry of A, Pn(k) and ^(fc) 
denote the posterior mean and variance of x n at iteration k, 
respectively. 
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GAMP Algorithm 

Initialization: given 0^; set k = 0, s^ 1 '* = 0,Vm G 
{/<(fc)}£Li and {</£(fc)}£=i are initial¬ 
ized as the mean and variance of the prior distribution. 
Repeat the following steps until \Pn(k + 1) — 
u;Hfc)| 2 < e, where e is a pre-specified error tolerance. 
Step 1. Vm € {1,..., M}: 

n 

T^{k) =^a^ n ^(fe) 

n 

Pm{k) =z m (k) - T^(k)s m (k - 1) 

Step 2. Vm € {1,..., A/}: 

Sm(fc) =g 0 ut(Pm{k),T^{k),e (t) ) 

Tm( k ) = - J^-gout(Pm(k), T^(fe), 0 ( °) 
UVm 

Step 3. Vn G {1,..., TV}: 

<(*0 = 

^Vi(^) —A^n(^) Ei(^) ^ ^ ^mn^m(^) 

m 

Step 4. Vn G {1,..., iV}: 

^,(fc+ 1 ) =<(fc) <(*),©(*)) 

Output: {r n (fco),r^(feo)}, {p m (fco),rP (fc 0 )}, and 
{/r^(fc 0 + l),<^(fc 0 + 1)}, where fc 0 stands for the last 
iteration. 


We have now derived an efficient algorithm to obtain 
approximate posterior distributions for the variables x and 
z = Ax. We see that the GAMP algorithm no longer needs to 
compute the aforementioned matrix inverse. The dominating 
operations in each iteration is the simple matrix-vector multi¬ 
plications, which scale as O(MN). Thus the computational 
complexity is significantly reduced. In the following, we 
discuss how to update the hyperparameters via the EM. 


B. Hyperparameter Learning via EM 

Given the current estimates of the hyperparameters 
9^ = {aW, 7 ^} and the observed data y, the E- 
step computes the expected value (with respect to the 
hidden variables x) of the complete log-posterior of 
{a, 7 }, i.e. E x \ y aW r/( t) [logp(a, -f\x, y)], where the operator 
E x \y «(*), 7 W ['] denotes the expectation with respect to the 
posterior distribution p(x\y, 7 ^). This complete log- 
posterior is also referred to as the Q-function. The hyperparam¬ 
eters {a, 7 } are then estimated by maximizing the Q-function, 
i.e. 

{a^ t+1 \ 7V+ 1 )} = argmaxQ(o!, 7 |aW, 7 ^) 

OL, 7 

= argmaxE^i (t ) (t ) [logp(a, y\x, y)\ 

OL, 7 1 ' 

( 27 ) 


Since 


p(a, "/\x, y) tx p{cx)p(x\a)p(-y)p{y\x, 7 ) (28) 

the Q-function can be decomposed into a summation of two 
terms 

Q(a, 7 |a (t) , 7 W ) =E xlya{t ) tjW [logp(a)p(x\a)\ 

+ ^x|i/,a(‘), 7 W [logp(7)p(2/|®,7)] 
—Q(ot\ a^,7^) + Q{ 7|a (t) ,7 (t) ) (29) 


We observe that in the above Q-function, the hyperparameters 
a and 7 are separated from each other. This decouples the 
estimation of a and 7 into two independent optimization 
problems. We first examine the update of a, i.e. 

Q,(t+i) = argmaxQ(a|a^\ 7 ^) (30) 


Recalling (fill) — Q{ot\oS t \'y ( ' t ' > ) can be expressed as 

Q(a|a^, 7 ^) 

ocE f (a - 1) log a n - ba n + - logy n - -zVn(Xn) J (31) 

n=1 ' ' 


where (x 2 ) denotes the expectation with respect to 
the posterior distribution p(x\y, 0 ^, 7 ®). Here we use 
p(x n \y, r n (ko), T^(ko), 9^), i.e. the approximate posterior 
distribution of x n obtained from the GAMP algorithm, to 
replace the true posterior distribution p[x\y, (yy\ 7 ^) in 
computing the expectation. Since p{x n \y, r n (ko), r^(fco), 
follows a Gaussian distribution with its mean and variance 
given by (fl~ 8 l>—(fl9l>. we have 


«} = (, V ( f° ))a + (32 ) 

(1 + ijn r^(k 0 )) 2 1 + 1/n T^(ko) 

We see that in the Q-function (ED, hyperparameters are 
entangled with each other due to the logarithm term log?y ra 
(note that rj n , defined in (fTD . is a function of a). In this 
case, an analytical solution to the optimization (l30l > is difficult 
to obtain. Gradient descent methods can certainly be used 
to search for the optimal solution. Nevertheless, for gradient 
descent methods, there is no explicit formula for the hyperpa¬ 
rameter update. Also, gradient-based methods involve higher 
computational complexity as compared with an analytical 
update rule. Here we consider an alternative strategy which 
aims at finding a simple, analytical sub-optimal solution of 
(l30l >. Such an analytical sub-optimal solution can be obtained 
by examining the optimality condition of (l30l >. Suppose a* = 
{a*} is the optimal solution of (l30l >. Then the first derivative 
of the Q-function with respect to a equals to zero at the 
optimal point, i.e. 


dQ(a\9 w ) 


da r . 


= 0 (33) 

a* 2 2 


where 


E -4 

Un=(xl)+P E E) 

ieiv (n) 


(34) 


(35) 
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Since all hyperparameters {a n } and 8 are non-negative, it 
can be easily verified (1/a*) > (I/ 7 ?*) > 0, and (l//3a*) > 
(1 /r)*) > 0 for i G N( n y Therefore v* is bounded by 


— > < > 0 

ry* 


Consequently we have 
a + 1.5 


> 


a — 1 


1 , a — 1 
2 Vn > 


(36) 


(37) 


or or 

Combining (l33l> and (l37l) . we reach that a* is within the range 


a — 1 a + 1.5 
0.5 uj n + b ’ 0.5w n + b 


Vn 


(38) 


A sub-optimal solution to (l30l> can therefore simply be chosen 
as 


a 


(t+i) = 
n 


a — 1 
0.5w ra + b 


Vn 


(39) 


We see that the solution ( [39b provides a simple rule for the 
update of a. Also, notice that the update rule ( l39l > resembles 
that of the conventional sparse Bayesian learning work [4] 
except that ui n is equal to (x^) for the conventional sparse 
Bayesian learning method, while for our case, uj n is a weighted 
summation of ( x l) and ( x i) for i G N( n y Numerical results 
show that for a properly chosen value of a, this update 
rule, although sub-optimal, guarantees an exact recovery and 
provides superior recovery performance. Specifically, a = 1.5 
is empirically proven to be a robust choice which ensures that 
the proposed algorithm delivers stable recovery performance. 

We now discuss the update of the hyperparameter 7, the 
inverse of the noise variance. Since the GAMP algorithm also 
provides an approximate posterior distribution for the noiseless 
output z = Ax , we can simply treat z as hidden variables 
when learning the noise variance, i.e. 


7 (t+1) = argmax£; z | y a(t)i7 ( t )[logp( 7 )p(y|z, 7 )] 

= argmaxQ(7|a^,7^) (40) 

7 


Taking the partial derivative of the Q-function with respect to 
7 gives 


^7 


c — 1 


M 


M 1 2 \ 

d + — — — / v ({l/m ~ z m) ) 

• i 


(41) 


where (•) denotes the expectation with respect to 
p(z rn \y,p rn (k 0 ),TP l (k 0 ),9 {t ' > ), i.e. the approximate posterior 
distribution of z m . Recalling that the approximate posterior 
of z m follows a Gaussian distribution with its mean and 
variance given by (12Tb — d22l> . we have 

((Vm - Zm) 2 ) = {Vm - 9m) 2 + 4>m ( 42 ) 


where and (j) z m are given by (OTb (l22l). with [p rn . r^ ( } 
replaced by {p m (ko), r^(fco)}, and 7 replaced by Setting 
the derivative equal to zero, we obtain the update rule for 7 
as 


7 (t+i) = 


M + 2c - 2 

2d + — Z m ) 2 ) 


(43) 


So far we have completed the development of our GAMP- 
based pattern-coupled sparse Bayesian learning algorithm. 
For clarify, we now summarize our proposed PCSBL-GAMP 
algorithm as follows. 

PCSBL-GAMP Algorithm 

”7 Initialization: given a/ 0 -* and 7 ^ 0 \ 

2. For t > 0: given a- 1 ' 1 and 7 ^/ call the GAMP 
algorithm. Based on the outputs of the GAMP algo¬ 
rithm, update the hyperparameters a4 t+1 ) and 7 ( t+1 ) 
according to ( l39b and ( 143k 

3. Continue the above iteration until the difference be¬ 
tween two consecutive estimates of x is negligible, or 
a maximum number of iterations is reached. 


IV. Simulation Results 

We now carry out experiments to illustrate the performance 
of our proposed algorithm, also referred to as PCSBL-GAMP 
algorithm, and its comparison with other existing methods. 
The performance of the proposed algorithm^ will be examined 
using both synthetic and real data. The parameters a, b, c, d 
for our proposed algorithm are set equal to a = 1.5, b = 
10~ 6 , c = 1, and d = ICC 6 throughout our experiments. The 
relevance parameter /3 is set equal to 8 = 1 in our experiments. 
In fact, our empirical results suggest that its choice is not very 
critical to the recovery performance as long as [8 is set into 
the region /3 G [0.1,1], 

A. Synthetic Data 

We first evaluate the recovery performance of the PCSBL- 
GAMP method using the synthetic data. In our simulations, 
we generate a one-dimensional block-sparse signal in a way 
similar to [21]. Suppose the iV-dimensional sparse signal 
contains K nonzero coefficients (K is also denoted as the 
sparsity level) which are partitioned into T blocks with random 
sizes and random locations. The nonzero coefficients of the 
sparse signal x and the measurement matrix A G R MxJV 
are randomly generated with each entry independently drawn 
from a normal distribution, and then the sparse signal x and 
columns of A are normalized to unit norm. Fig. Q] depicts 
the success rate of the proposed PCSBL-GAMP method vs. 
the ratio M/N, where we set N = 200, K = 40 and 
T = 6. The success rate is computed as the ratio of the 
number of successful trials to the total number of independent 
runs. A trial is considered successful if the normalized squared 
recovery error \\x — x|| 2 /||a;|| 2 is no greater than 10~ 6 , where 
x denotes the estimate of the sparse signal. The success rates 
of the EM-based PC-SBL method [21] (referred to as the 
PCSBL-EM), the conventional SBL [4], and the basis pursuit 
(BP) method [2], [24] are also included for comparison. From 
Fig.cn we see that the PCSBL-GAMP method achieves almost 
the same performance as that of the PCSBL-EM method, and 
presents a significant performance advantage over the SBL 
and BP methods due to exploiting the underlying block-sparse 
structures. The average run times of respective algorithms as 

3 Matlab codes for our algorithm are available at 
http://www.junfang-uestc.net/codes/PCSBL-GAMP.rar 
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a function of the signal dimension TV is plotted in Fig. [2] 
where we set M = 0.47V. Results are averaged over 100 
independent runs. We see that the PCSBL-GAMP requires 
much less run time than the PCSBL-EM, particularly when 
the signal dimension TV is large. Also, it can be observed that 
the average run time of the PCSBL-EM grows rapidly with an 
increasing TV, whereas the average run time of the PCSBL- 
GAMP increases very slowly. This observation coincides with 
our computational complexity analysis. 



M/N 


Fig. 1. Success rates vs. the ratio M/N. 



Fig. 2. Average run times vs. N. 

We also carry out experiments using patches of letters 
“C” and “S” (16 x 16 pixels) with black background, where 
most of the pixels on the patches are zeros and the nonzero 
coefficients exhibit irregular block patterns. White Gaussian 
noise is added to the patches. We compare our method 
with other block-sparse signal recovery algorithms, namely, 
the cluster-structured MCMC algorithm (Cluss-MCMC) [17], 
the Boltzman machine-based greedy pursuit algorithm (BM- 
MAP-OMP) [18], and the block sparse Bayesian learning 
method (BSBL) method [15], [20]. Note that although the BM- 
MAP-OMP and the BSBL algorithms are developed for one¬ 
dimensional sparse signal recovery, we extend their methods 
to the two-dimensional scenario. In our simulations, model 


parameters used by the competing algorithms are adjusted to 
achieve the best performance. For the BSBL algorithms, the 
block size parameter h is set equal to 2. Fig. [3] depicts the 
original patches and the patches reconstructed by respective 
algorithms, where we set M = 80, and the signal-to-noise 
ratio (SNR) is set to 20dB with SNR = 10 log(|| Ax\\\/Mcr 2 ). 
It can be seen that our proposed PCSBL-GAMP method 
provides the best visual quality with recognizable letters, 
whereas the letters reconstructed by other algorithms have 
considerably lower quality, particularly for the Cluss-MCMC 
and the BM-MAP-OMP methods. This result also implies that 
our proposed method is flexible to accommodate any irregular 
cluster patterns. 

B. Satellite Image Recovery 

In this subsection, we carry out experiments on a non¬ 
negative 256 x 256 satellite imageB The image is sparse in 
the spatial domain, with only 6678 (approximately 10.2% of 
total pixels) nonzero pixels. In our experiments, compressive 
measurements are corrupted by additive i.i.d. Gaussian noise, 
i.e. y = Ax + w, where the image is represented as a one¬ 
dimensional vector x. The sensing matrix A is chosen to 
be the same as that used in [25], i.e. A = where 

$ € {0, l} MxAr and its rows are randomly selected from the 
TV x TV identity matrix, e { —l,l} ArxAr is a Hadamard 
transform matrix, S € M. NxN is a diagonal matrix with its 
entries randomly chosen from { — 1,1}. Sensing using such 
a measurement matrix can be executed using a fast binary 
algorithm, which makes the hardware implementation simple. 
Note that the BM-MAP-OMP, the BSBL and the Cluss- 
MCMC methods were not included in this experiment due 
to their prohibitive computational complexity when the signal 
dimension is large. Instead, we compare our method with 
some other computationally efficient GAMP-based methods, 
namely, the EM-NNGM-GAMP method [25] and the EM- 
GM-AMP method [26]. These two methods have demonstrated 
state-of-the-art recovery performance in a series of experi¬ 
ments. The spectral projected gradient (SPG) method (referred 
to as SPGL1) which was developed in [27] to efficiently solve 
the basis pursuit or basis pursuit denoising optimizations is 
also included for comparison. Among these methods, only 
the EM-NNGM-GAMP algorithm exploits the non-negativity 
of the satellite image. Therefore the signals recovered by 
these algorithms, except the EM-NNGM-GAMP, may contain 
negative coefficients. These negative coefficients are manually 
set to zero in our simulations. Fig.[4]shows the original satellite 
image and the images reconstructed by respective algorithms, 
where we set M = 0.157V and SNR = 60dB. It can be 
seen that the PCSBL-GAMP offers a significantly better image 
quality as compared with other methods. Fig. [5] plots the nor¬ 
malized mean square errors (NMSEs) of respective algorithms 
vs. the ratio M/N. Results are averaged over 100 independent 
trails, with the sensing matrix A randomly generated for each 
trial. The recovered negative coefficients are kept unaltered in 
calculating the NMSEs. We see that our proposed PCSBL- 
GAMP method outperforms other algorithms by a big margin 

4 Image data is available at http://sourceforge.net/projects/gampmatlab 
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Fig. 3. Original patches of letters and patches reconstructed by respective algorithms. 
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Fig. 4. Original satellite image and images reconstructed by respective algorithms. 


for a small ratio M/N (e.g. M/N < 0.25), where data 
acquisition is practically appealing due to high compression 
rates. 



Fig. 5. Satellite image: NMSEs vs. the ratio M/N. 


C. Background Subtraction 

Background subtraction, also known as foreground detec¬ 
tion, is a technique used to automatically detect and track 
moving objects in videos from static cameras. Usually, the 


foreground innovations are sparse in the spatial image domain. 
By exploiting this sparsity, the sparse foreground innova¬ 
tions within a scene can be reconstructed using compressed 
measurements, which relieves the communications burden 
placed on data transmission [8]. Specifically, the idea is to 
reconstruct the foreground image from the difference between 
the compressed measurements of the background image and 
the compressed measurements of the test image [8] 

yf=Vt-Vb = M x t - x b ) 

=Ax f 

where x t and x b represent the test and the background images, 
respectively; y t and y b denote the compressed measurements 
of the test and background images, respectively; and Xf is 
the foreground image to be recovered. In our experiments, we 
use the Convoy2 data set that was used in [28]. The Convoy2 
data set was collected on the Spesutie island, consisting of a 
video sequence with 260 frames and one background frame 
recorded by a single static camera. The video sequence has 
a dynamic sparse foreground as vehicles enter and exit the 
filed of view over time. We first choose the 40th frame of the 
Convoy2 data set as a test image, which is shown in Fig. [6] The 
background image and the foreground image are also included 
in Fig. [6] The foreground image is regarded as the groundtruth 
image. This foreground image, however, does not have a pure 
background since Xf = x t — x b is not an exactly sparse 
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signal and contains many small nonzero components. In our 
experiments, the original images of 480x381 pixels are resized 
to 120 x 96 pixels. For the resized foreground image, we 
have a total number of 923 coefficients whose magnitudes are 
greater than 10 -2 , thus the percentage of nonzero coefficients 
is 923/(120 x 96) = 8.01%. Again, the BM-MAP-OMP, 
BSBL, and the Cluss-MCMC methods are not included due to 
their prohibitive computational complexity. Here we compare 
our method with the EM-GM-AMP method [26] and the EM- 
BG-AMP method [29]. Fig. |7] depicts images reconstructed 
by respective algorithms, where we use only M = 0.1N 
measurements. The measurement matrix A is randomly gen¬ 
erated with each entry independently drawn from a normal 
distribution. We see that our proposed PCSBL-GAMP method 
provides the finest image quality with a clear appearance of 
the vehicle, whereas the object silhouettes recovered by other 
methods are hardly recognizable. In our next experiments, 
frames from the 10th to 60th are used as test images. For 
each test image, we use M = 0.1 N measurements to recover 
the difference (foreground) image. Fig.[8]shows the NMSEs of 
respective algorithms vs. the frame number, where the NMSEs 
are obtained by averaging over 100 independent runs, with 
the measurement matrix A randomly generated for each run. 
From Fig. [8j we observe that our proposed method presents a 
significant performance advantage over other methods. 

V. Conclusions 

We developed a pattern-coupled sparse Bayesian learning 
method for recovery of two-dimensional block-sparse sig¬ 
nals whose cluster patterns are unknown a priori. A two- 
dimensional pattern-coupled hierarchical Gaussian prior model 
is introduced to characterize and exploit the pattern dependen¬ 
cies among neighboring coefficients. The proposed pattern- 
coupled hierarchical model is effective and flexible to cap¬ 
ture any underlying block-sparse structures, without requiring 
the prior knowledge of the block partition. An expectation- 
maximization (EM) strategy is employed to infer the max¬ 
imum a posterior (MAP) estimate of the hyperparameters, 
along with the posterior distribution of the sparse signal. 
Additionally, the generalized approximate message passing 
(GAMP) algorithm is embedded in the EM framework to effi¬ 
ciently compute an approximation of the posterior distribution 
of hidden variables, which results in a significant reduction 
in computational complexity. Numerical results show that 
our proposed algorithm presents a substantial performance 
advantage over other existing state-of-the-art methods in image 
recovery. 
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